Goto

Collaborating Authors

 collateral damage


Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization

Yang, Guangmingmei, Miller, David J., Kesidis, George

arXiv.org Artificial Intelligence

Most post-training backdoor detection methods rely on attacked models exhibiting extreme outlier detection statistics for the target class of an attack, compared to non-target classes. However, these approaches may fail: (1) when some (non-target) classes are easily discriminable from all others, in which case they may naturally achieve extreme detection statistics (e.g., decision confidence); and (2) when the backdoor is subtle, i.e., with its features weak relative to intrinsic class-discriminative features. A key observation is that the backdoor target class has contributions to its detection statistic from both the backdoor trigger and from its intrinsic features, whereas non-target classes only have contributions from their intrinsic features. To achieve more sensitive detectors, we thus propose to suppress intrinsic features while optimizing the detection statistic for a given class. For non-target classes, such suppression will drastically reduce the achievable statistic, whereas for the target class the (significant) contribution from the backdoor trigger remains. In practice, we formulate a constrained optimization problem, leveraging a small set of clean examples from a given class, and optimizing the detection statistic while orthogonalizing with respect to the class's intrinsic features. We dub this plug-and-play approach Class Subspace Orthogonalization (CSO) and assess it against challenging mixed-label and adaptive attacks.


No Free Lunch in Language Model Bias Mitigation? Targeted Bias Reduction Can Exacerbate Unmitigated LLM Biases

Chand, Shireen, Baca, Faith, Ferrara, Emilio

arXiv.org Artificial Intelligence

Large Language Models (LLMs) inherit societal biases from their training data, potentially leading to harmful or unfair outputs. While various techniques aim to mitigate these biases, their effects are often evaluated only along the dimension of the bias being targeted. This work investigates the cross-category consequences of targeted bias mitigation. We study four bias mitigation techniques applied across ten models from seven model families, and we explore racial, religious, profession- and gender-related biases. We measure the impact of debiasing on model coherence and stereotypical preference using the StereoSet benchmark. Our results consistently show that while targeted mitigation can sometimes reduce bias in the intended dimension, it frequently leads to unintended and often negative consequences in others, such as increasing model bias and decreasing general coherence. These findings underscore the critical need for robust, multi-dimensional evaluation tools when examining and developing bias mitigation strategies to avoid inadvertently shifting or worsening bias along untargeted axes.


Collateral Damage Assessment Model for AI System Target Engagement in Military Operations

Maathuis, Clara, Cools, Kasper

arXiv.org Artificial Intelligence

Abstract--In an era where AI (Artificial Intelligence) systems play an increasing role in the battlefield, ensuring responsible targeting demands rigorous assessment of potential collateral effects. In this context, a novel collateral damage assessment model for target engagement of AI systems in military operations is introduced. Its layered structure captures the categories and architectural components of the AI systems to be engaged together with corresponding engaging vectors and contextual aspects. At the same time, spreading, severity, likelihood, and evaluation metrics are considered in order to provide a clear representation enhanced by transparent reasoning mechanisms. Further, the model is demonstrated and evaluated through instantiation which serves as a basis for further dedicated efforts that aim at building responsible and trustworthy intelligent systems for assessing the effects produced by engaging AI systems in military operations.


Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack

Ha, SeungBum, Park, Saerom, Yoon, Sung Whan

arXiv.org Artificial Intelligence

Machine unlearning (MU) aims to expunge a designated forget set from a trained model without costly retraining, yet the existing techniques overlook two critical blind spots: "over-unlearning" that deteriorates retained data near the forget set, and post-hoc "relearning" attacks that aim to resurrect the forgotten knowledge. We first derive the over-unlearning metric OU@ε, which represents the collateral damage to the nearby region of the forget set, where the over-unlearning mainly appears. Next, we expose an unforeseen relearning threat on MU, i.e., the Prototypical Relearning Attack, which exploits the per-class prototype of the forget class with just a few samples, and easily restores the pre-unlearning performance. To counter both blind spots, we introduce Spotter, a plug-and-play objective that combines (i) a masked knowledge-distillation penalty on the nearby region of forget set to suppress OU@ε, and (ii) an intra-class dispersion loss that scatters forget-class embeddings, neutralizing prototypical relearning attacks. On CIFAR-10, as one of validations, Spotter reduces OU@εby below the 0.05X of the baseline, drives forget accuracy to 0%, preserves accuracy of the retain set within 1% of difference with the original, and denies the prototype-attack by keeping the forget set accuracy within <1%, without accessing retained data. It confirms that Spotter is a practical remedy of the unlearning's blind spots.


UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

Patil, Vaidehi, Stengel-Eskin, Elias, Bansal, Mohit

arXiv.org Artificial Intelligence

User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model's other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods consistently achieving a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric, measuring the area-under-the-curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefitting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.


Efficacy of Full-Packet Encryption in Mitigating Protocol Detection for Evasive Virtual Private Networks

Parker, Amy Iris

arXiv.org Artificial Intelligence

Full-packet encryption is a technique used by modern evasive Virtual Private Networks (VPNs) to avoid protocol-based flagging from censorship models by disguising their traffic as random noise on the network. Traditional methods for censoring full-packet-encryption based VPN protocols requires assuming a substantial amount of collateral damage, as other non-VPN network traffic that appears random will be blocked. I tested several machine learning-based classification models against the Aggressive Circumvention of Censorship (ACC) protocol, a fully-encrypted evasive VPN protocol which merges strategies from a wide variety of currently in-use evasive VPN protocols. My testing found that while ACC was able to survive our models when compared to random noise, it was easily detectable with minimal collateral damage using several different machine learning models when within a stream of regular network traffic. While resistant to the current techniques deployed by nation-state censors, the ACC protocol and other evasive protocols are potentially subject to packet-based protocol identification utilizing similar classification models.


OpenVPN Is Open to VPN Fingerprinting

Communications of the ACM

VPN adoption has seen steady growth over the past decade due to increased public awareness of privacy and surveillance threats. In response, certain governments are attempting to restrict VPN access by identifying connections using "dual use" DPI technology. To investigate the potential for VPN blocking, we develop mechanisms for accurately fingerprinting connections using OpenVPN, the most popular protocol for commercial VPN services. We identify three fingerprints based on protocol features such as byte pattern, packet size, and server response. Playing the role of an attacker who controls the network, we design a two-phase framework that performs passive fingerprinting and active probing in sequence.


'The machine did it coldly': Israel used AI to identify 37,000 Hamas targets

The Guardian

The Israeli military's bombing campaign in Gaza used a previously undisclosed AI-powered database that at one stage identified 37,000 potential targets based on their apparent links to Hamas, according to intelligence sources involved in the war. In addition to talking about their use of the AI system, called Lavender, the intelligence sources claim that Israeli military officials permitted large numbers of Palestinian civilians to be killed, particularly during the early weeks and months of the conflict. Their unusually candid testimony provides a rare glimpse into the first-hand experiences of Israeli intelligence officials who have been using machine-learning systems to help identify targets during the six-month war. Israel's use of powerful AI systems in its war on Hamas has entered uncharted territory for advanced warfare, raising a host of legal and moral questions, and transforming the relationship between military personnel and machines. "This is unparalleled, in my memory," said one intelligence officer who used Lavender, adding that they had more faith in a "statistical mechanism" than a grieving soldier.


Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?

Bai, Andrew, Yeh, Chih-Kuan, Hsieh, Cho-Jui, Taly, Ankur

arXiv.org Artificial Intelligence

Fine-tuning pretrained foundational models on specific tasks is now the de facto approach for text and vision tasks. A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning. Rehearsing samples randomly from the pretrain dataset is a common approach to alleviate such forgetting. However, we find that random mixing unintentionally includes samples which are not (yet) forgotten or unlearnable by the model. We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting, which we call collateral damage. Since directly identifying collateral damage samples is computationally expensive, we propose a procedure to estimate the distribution of such samples by tracking the statistics of finetuned samples. Our approach is lightweight, easy to implement, and can be seamlessly integrated into existing models, offering an effective means to retain pretrain performance without additional computational costs.


Israel's use of AI in Hamas war can help limit collateral damage 'if executed properly,' expert says

FOX News

The Israel Defense Forces (IDF) have used artificial intelligence (AI) to improve targeting of Hamas operators and facilities as its military faces criticism for what's been deemed as collateral damage and civilian casualties. "I can't predict how long the Gaza operation will take, but the IDF's use of AI and Machine Learning (ML) tools can certainly assist in the administratively burdensome targeting identification, evaluation and assessment process," Mark Montgomery, a senior fellow at the Foundation for Defense of Democracies' Center on Cyber and Technology Innovation, told Fox News Digital. "Similar to U.S. forces, the IDF takes great effort to reduce collateral damage and civilian casualties, and tools like AI and ML can make the targeting process more agile and executable," Montgomery added. "AI tools should help in target identification efforts, expediting target review and approval," he said. "There will inevitably still be humans in the targeting process but in a much accelerated timeline."